Introduction

Why I chose this topic?

Problems with Obesity, specifically child obesity

There is concern about the rise of childhood obesity and the implications of such obesity persisting into adulthood. The risk of obesity in adulthood and risk of future obesity-related ill health are greater as children get older. Studies tracking child obesity into adulthood have found that the probability of overweight and obese children becoming overweight or obese adults increases with age. The health consequences of childhood obesity include: increased blood lipids, glucose intolerance, Type 2 diabetes, hypertension, increases in liver enzymes associated with fatty liver, exacerbation of conditions such as asthma and psychological problems such as social isolation, low self-esteem, teasing and bullying.

Source: https://fingertips.phe.org.uk/profile/national-child-measurement-programme/data#page/6/gid/8000011/pat/6/par/E12000004/ati/101/are/E07000032/iid/93458/age/201/sex/4

Aims

The aims of this project are:

  • Identify the most influencing factors for childhood obesity for children in in Year 6 (age 10/11 in 2017/18)

    • How do these vary by local area?

Childhood obesity in 2017/18 ranges from 11.37% in Richmond upon Thames to 29.66% in Barking and Dagenham.

  • Determine if different factors effect the prevalence of obesity at Reception for the same cohort of children (age 5/6 in 2010/11) within the same local area

    • Are these different and does this explain the increase in childhood obesity from Reception to Year 6?

In 2010/11, the obesity levels for Reception pupils was 9.44%. In 2017/18, when the same cohort of pupils were in Year 6, the measured levels of childhood obeisty had increased to 20.14%

I have chosen to do this on local authority level as this is gives 152 different areas to model whilst ensuring there is still ample data available at this denomination. Local authorities are also responsible for local schools and allocating spending within the area so give a good variation of different spending allocations/policies the findings could be used to explore.

Structure of Analysis

  • Importing the data
  • Cleaning data, checking for missing variables
  • Exploratory Data Analysis
  • Analysis
  • Model Evaluation
  • Findings
  • Final Thoughts

Viewing the Fingertips Data

# Viewing the indicator ID and area type ID for all the fingertips data
indicators <- indicators()
areatypes <- area_types()

All of the indicators can be seen using (View(indicators)), it is then easy to search for any specific indicator

# Use this function to view the indicator name for any indicator ID
ind <- function(a) {
  g <- as_tibble(indicator_metadata(a)[, 2]) %>%
    filter(row_number() == 1) %>%
    pull(Indicator)
  cat(g)
}
# Use this function to view the indicator definition for any indicator ID
def <- function(a) {
  g <- as_tibble(indicator_metadata(a)[, 3]) %>%
    filter(row_number() == 1) %>%
    pull(Definition)
  cat(g)
}
#' Use this function to extract the Value and Time period data for a specific indicator ID and area code.

#' @param a Indicator ID number
#' @param b Area Type ID number

#' @return function returns a dataframe containing the Area Name, Indicator Value and Time Period

datacollection <- function(a, b) {
  e <- fingertips_data(IndicatorID = a, AreaTypeID = b)
  
  e <- e %>%
    filter(AreaType == "County & UA") %>%                   # selects all data with County and UA as its area
    select(AreaName, Value, Timeperiod, IndicatorName)      # selects only the columns of interest
  
  colnames(e)[colnames(e) == "Value"] <- e[4, 4]            # renames the column name value with the Indicator Name
  e <- e[-4]                                                # removes the indicator name column
  
  # simplifying the names
  
  names(e) <- names(e) %>% 
              {gsub("\\s*\\([^\\)]+\\)", "", .) } %>%      # removes any text in brackets
              {gsub(", ", "", .) } %>%                     # removes commas
              {gsub("- ", "", .) } %>%                     # removes -
              {gsub("\\s+$", "", .) } %>%                  # removes spaces at the end of name
              {gsub(" ", "_", .) } %>%                     # replaces all spaces with underscores
              {gsub(":", "", .) } %>%                      # removes colons
              {tolower(.) }                                # makes names lowercase
  e
}
# Use this function to select a specific time frame
dateselector <- function(a, b) {    # takes data and a date
  d <- a %>%
    filter(timeperiod == b)         # filters the data to only show the required time frame
  d <- d[-3]                        # removes the TimePeriod column
}

Loading the Data

Prevalence of Obesity

These are the variables I would like to model:

Reception: Prevalence of obesity (including severe obesity) measured by the BMI greater than or equal to the 95th centile of the UK90 growth reference among children in Reception (age 4-5 years)

# Reception: Prevalence of Obesity
# (Indicator ID:90319)
Reception_Obesity <- datacollection(90319, 102)                         # selecting the data
Reception_Obesity[2] <- round(Reception_Obesity[2], 10)                 # rounding data
Reception_Obesity <- dateselector(Reception_Obesity, "2010/11")         # selecting only 2010/11 values

Year 6: Prevalence of obesity (including severe obesity) measured by BMI greater than or equal to the 95th centile of the UK90 growth reference among children in Year 6 (age 10-11 years)

# Year 6 Prevalence of Obesity
# (Indicator ID:90323)
Y6_Obesity <- datacollection(90323, 102)                # selecting the data
Y6_Obesity[2] <- round(Y6_Obesity[2], 10)               # rounding data
Y6_Obesity <- dateselector(Y6_Obesity, "2017/18")       # selecting only 2017/18 values

Possible Explanatory Variables

These are the variables I would like to use to predict childhood obesity:

School Readiness: the percentage of children achieving a good level of development at the end of reception.
Children are defined as having reached a good level of development if they achieve at least the expected level in the early learning goals in the prime areas of learning and the early learning goals in the specific areas of mathematics and literacy.

  • Using 2012/13 data as this is the earliest available (closest to 2010/11)
# School Readiness
# (Indicator ID:90631)
School_Readiness <- datacollectionsex(90631, 102)               # selecting the data
School_Readiness <- dateselector(School_Readiness, "2012/13")   # selecting only 2012/13 values

Children with one or more decayed, missing or filled teeth at age 3

  • There is another indicator of decayed, missing or filled teeth at age 5 however this had more missing values so I chose to use the age 3 data.
  • Using 2012/13 data as this is the only time period available
# Tooth Decay Data
# (Indicator ID:92501)  
Tooth_Decay <- datacollection(92501, 102)             # selecting the data
Tooth_Decay <- dateselector(Tooth_Decay, "2012/13")   # selecting only 2012/13 values
# Imputting missing values manually using corresponding values for the Area
Tooth_Decay <- Tooth_Decay %>%
  # London
  mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Greenwich", 0.42371163)) %>%
  mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Bexley", 0.42371163)) %>%
  mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Waltham Forest", 0.42371163)) %>%
  mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Islington", 0.42371163)) %>%
  # South East
  mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "East Sussex", 0.27081672)) %>%
  # East Midlands
  mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Lincolnshire", 0.43444082)) %>%
  # South West
  mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Bath and North East Somerset", 0.30771022))

Deprivation score (IMD 2015) (measures of multiple deprivation at the small area level)

  • Using 2015 data as this is the only time period available
# Deprivation Score
# (Indicator ID:91872)  
Deprivation_Score <- datacollection(91872, 102)               # selecting the data
Deprivation_Score <- dateselector(Deprivation_Score, "2015")  # selecting 2015 values

Persistent absentees - Primary school Percentage of primary school enrolments classed as persistent absentees (defined as missing 10% or more of possible sessions).

  • Using 2014/15 data as this is the earliest available and close to the middle of 2010/11 and 2017/18
# Persistent Absentees
# (Indicator ID:92563)
Persistent_Absentees <- datacollection(92563, 102)                      # selecting the data
Persistent_Absentees <- dateselector(Persistent_Absentees, "2014/15")   # selecting 2014/15 values

Children in low income families aged 5 to 10 - The number of children aged between 5 and 10 in a family which is in receipt of Working Tax Credits, Child Tax Credits, Income Support, or Jobseekers Allowance.

  • 2013 data, close to the middle of 2010/11 and 2017/18 and latest time available
# Low Income Familes
# (Indicator ID:92479)
Low_Income_Families <- datacollectionage(92479, 102, "5-10 yrs")        # selecting the data
Low_Income_Families <- dateselector(Low_Income_Families, "2013")        # selecting 2013 values

Admissions for diabetes for children and young people aged under 19 years

  • A better indicator could be Admissions for diabetes for children aged 0 - 5 however this had a lot of missing values.
  • Using 2016/17 data used as this is the most recent
# Diabetes
# (Indicator ID:92622)
Diabetes <- datacollection(92622, 102)            # selecting the data
Diabetes <- dateselector(Diabetes, "2016/17")     # selecting 2016/17 values

# Replacing missing value for Derby with 2015/16 value
Diabetes <- Diabetes %>%
  mutate(admissions_for_diabetes_for_children_and_young_people_aged_under_19_years = replace(admissions_for_diabetes_for_children_and_young_people_aged_under_19_years, areaname == "Derby", 46.80666))

School pupils with social, emotional and mental health needs: % of school pupils with social, emotional and mental health needs (Primary school age)

  • 2018 data used as it is the latest value available
# Social Health Needs
# (Indicator ID:91871)  
Social_Health_Needs <- datacollectionage(91871, 102, "Primary school age")    # selecting the data
Social_Health_Needs <- dateselector(Social_Health_Needs, "2018")              # selecting 2018 values

Free school meals: % uptake among all pupils (Primary school age)

  • Using 2013 data
# Free school meals
# (Indicator ID: 90922) 
FreeSchoolMeals <- datacollectionage(90922, 102, "Primary school age")      # selecting the data
FreeSchoolMeals <- dateselector(FreeSchoolMeals, "2013")                    # selecting 2013 values

Obesity: QOF prevalence (18+): Patients aged 18 and over with a BMI of 30 or above

  • 2017/18 data used as it is the only time period available
# Obesity 18+
#(Indicator ID: 92588) 
Obesity18plus <- datacollection(92588, 102)                   # selecting the data
Obesity18plus <- dateselector(Obesity18plus, "2017/18")       # selecting 2017/18 values

Gender Pay equality - Gross median hourly pay, excluding overtime, for women

  • 2015 data used as it is the only time period available
# Gender Pay Equality
# (Indicator ID: 92817) 
GenderPayEquality <- datacollection(92817, 102)     # selecting the data
GenderPayEquality <- GenderPayEquality[-3]          # removing time period column as only 1 timeperiod available

# Replace missing value for Kensington and Chelsea with average of Hammersmith and Fulham and Westminster
# calculating the average (89.37865+74.79319)/2 = 82.08592
GenderPayEquality <- GenderPayEquality %>%
  mutate(gender_pay_equality = replace(gender_pay_equality, areaname == "Kensington and Chelsea", 82.08592))

Percentage of physically active adults

  • 2015/16 data used as it is the earliest time period available
# Physically active adults
# (Indicator ID: 93014)
ActiveAdults <- datacollection(93014, 102)              # selecting the data
ActiveAdults <- dateselector(ActiveAdults, "2015/16")   # selecting 2015/16 time period

Breastfeeding initiation - Measures the percentage of mothers who give their babies breast milk in the first 48 hours after delivery

  • Used 2010/11 (earliest on fingertips) data to be as close to the value when the pupils were born.
  • Filled missing values with 2011/12 data for the missing values (and 2012/13 for West Sussex where the previous 2 years were unavailable)
# Breastfeeding Initiation
#(Indicator ID: 20201)  
Breastfeeding <- datacollection(20201, 102)

# selecting 2010/11, 2011/12 and 2012/13 Data
Breastfeeding1011 <- dateselector(Breastfeeding, "2010/11")
Breastfeeding1112 <- dateselector(Breastfeeding, "2011/12")
Breastfeeding1213 <- dateselector(Breastfeeding, "2012/13")

# merging these together into 1 data frame
Breastfeeding2 <- list(Breastfeeding1011, Breastfeeding1112, Breastfeeding1213) %>%
  reduce(left_join, by = "areaname")

# replacing all NA values for 2010/11 with the value for 2011/12
Breastfeeding2 <- within(Breastfeeding2, breastfeeding_initiation.x <- ifelse(is.na(breastfeeding_initiation.x), breastfeeding_initiation.y, breastfeeding_initiation.x))

# replacting NA value for 2010/11 that was not replaced by 2011/12 data with the 2012/13 data
Breastfeeding2 <- within(Breastfeeding2, breastfeeding_initiation.x <- ifelse(is.na(breastfeeding_initiation.x), breastfeeding_initiation, breastfeeding_initiation.x))

# selcting the newly filled 2010/11 column
Breastfeeding <- select(Breastfeeding2, areaname, breastfeeding_initiation.x)

# removing .x from column name
names(Breastfeeding) <- gsub(".x", "", names(Breastfeeding))

Population vaccination coverage - MMR for one dose (5 years old)

  • 2010/11 data used as this is when the children were 5 years old
# MMR Vaccination coverage
# (Indicator ID: 30310)  
Vaccination <- datacollection(30310, 102)               # Selecting the data
Vaccination <- dateselector(Vaccination, "2010/11")     # Selecting 2010/11 values

Infant mortality rate

  • Child mortality rate might have been a better indicator however this had many missing values
  • Used infant mortality rate instead however the data is only for 2007/09
# Infant Mortality Rate
#(Indicator ID: 92196)
Mortality <- datacollection(92196, 102)               # Selecting the data
Mortality <- dateselector(Mortality, "2007 - 09")     # Selecting the time period

The percentage of the total resident population who are 0-15 years of age

  • Used 2017 time period as it is the only one available
# Resident Population Aged 0 - 15
# (Indicator ID: 93084)  
residentpopulation <- datacollection(93084, 102)      # selecting the data
residentpopulation <- residentpopulation[-3]          # removing time period column as only 2017 values available

Access to woodland: Percentage of the population in each local authority that has accessible woodland of at least 2 hectare within 500 metres of where they live

  • 2015 data used as it is the only data available
  • Missing Values of Barking & Dagenham and Islington - Take an average of the surrounding boroughs.
# Woodland Access
#(Indicator ID: 92814) 
Woodland <- datacollection(92814, 102)    # collecting the data
Woodland <- Woodland[-3]                  # removing time period column

# Barking & Dagenham = Average of Bexley, Greenwich, Havering, Newham and Redbridge
# (4.57+25.78+14.74+11.97+24.41)/5 = 16.294
# Islington = Average of Camden, Hackney and Haringey
# (5.95+9.87+8.66)/3 = 8.16
Woodland <- Woodland %>%
  mutate(access_to_woodland = replace(access_to_woodland, areaname == "Barking and Dagenham", 16.294)) %>%
  mutate(access_to_woodland = replace(access_to_woodland, areaname == "Islington", 8.16))

Admissions for asthma for children aged 0 to 9

  • Using 2016/17 data and 2015/16 data for the missing values
# Asthma Admissions
# (Indicator ID: 92481) 
Asthma <- datacollection(92481, 102)          # collecting the data
Asthma1617 <- dateselector(Asthma, "2016/17") # selecting 2016/17 data
Asthma1516 <- dateselector(Asthma, "2015/16") # selecting 2015/16 data

Asthma2 <- list(Asthma1617, Asthma1516) %>%   # joining together both time values
  reduce(left_join, by = "areaname")

# replacing na in 2016/17 data with 2015/16 data
Asthma2 <- within(Asthma2, admissions_for_asthma_for_children_aged_0_to_9.x <- ifelse(is.na(admissions_for_asthma_for_children_aged_0_to_9.x), admissions_for_asthma_for_children_aged_0_to_9.y, admissions_for_asthma_for_children_aged_0_to_9.x))

# selecting filled column
Asthma <- select(Asthma2, areaname, admissions_for_asthma_for_children_aged_0_to_9.x) 

names(Asthma) <- gsub(".x", "", names(Asthma)) # removing .x from column name

Admissions for gastroenteritis in infants aged 2, 3 and 4 years

  • Using 2016/17 data and 2015/16 data for the missing values
# Gastroenteritis Admissions
# (Indicator ID: 92248)
gastroenteritis <- datacollection(92248, 102)                     # collecting the data
gastroenteritis1617 <- dateselector(gastroenteritis, "2016/17")   # selecting 2016/17 values
gastroenteritis1516 <- dateselector(gastroenteritis, "2015/16")   # selecting 2015/16 values

gastroenteritis2 <- list(gastroenteritis1617, gastroenteritis1516) %>%    # joining together both time values
  reduce(left_join, by = "areaname")

# replacing na in 2016/17 data with 2015/16 data
gastroenteritis2 <- within(gastroenteritis2, admissions_for_gastroenteritis_in_infants_aged_23_and_4_years.x <- ifelse(is.na(admissions_for_gastroenteritis_in_infants_aged_23_and_4_years.x), admissions_for_gastroenteritis_in_infants_aged_23_and_4_years.y, admissions_for_gastroenteritis_in_infants_aged_23_and_4_years.x))

 # selecting filled column
gastroenteritis <- select(gastroenteritis2, areaname, admissions_for_gastroenteritis_in_infants_aged_23_and_4_years.x) 

names(gastroenteritis) <- gsub("_in_infants_aged_23_and_4_years.x", "", names(gastroenteritis)) # renaming column

Admissions for respiratory tract infections in infants aged under 1 year

  • 2016/17 data used
  • Nottingham and Nottinghamshire missing due to errors with HES coding - Using 2015/16 data to fill these values
# Admissions for Respiratory Tract Infections
# (Indicator ID: 92253)
respiratory <- datacollection(92251, 102)             # selecting the data
respiratory <- dateselector(respiratory, "2016/17")   # selecting 2016/17 time period

# Missing Data for Nottingham and Nottinghamshire

# Nottingham
respiratory <- respiratory %>%
  mutate(admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year = replace(admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year, areaname == "Nottingham", 657.6448)) %>%
  
# Nottinghamshire
  mutate(admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year = replace(admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year, areaname == "Nottinghamshire", 642.7746))

Parent Name

# Selecting the Parent Name
Parent_Name <- fingertips_data(IndicatorID = 92814) %>%
  filter(AreaType == "County & UA") %>%
  select(AreaName, ParentName)
names(Parent_Name) <- tolower(names(Parent_Name))

Merging the Data

# Merging the Data
EDA_Data <-
  list(
    Parent_Name,
    Reception_Obesity,
    Y6_Obesity,
    Deprivation_Score,
    Persistent_Absentees,
    Low_Income_Families,
    School_Readiness,
    Social_Health_Needs,
    Diabetes,
    Tooth_Decay,
    Obesity18plus,
    ActiveAdults,
    Asthma,
    Breastfeeding,
    gastroenteritis,
    Vaccination,
    Mortality,
    respiratory,
    Woodland,
    residentpopulation,
    GenderPayEquality
  ) %>%
  purrr::reduce(left_join, by = "areaname")

Adding the London Indicator

# London Indicator
EDA_Data <- mutate(EDA_Data, london = ifelse(EDA_Data$parentname == "London region", 1, 0))

# List of all Inner London Boroughs
central <- c("Camden", "Greenwich", "Hackney", "Hammersmith and Fulham", "Islington", "Kensington and Chelsea", "Lambeth", "Lewisham", "Southwark", "Tower Hamlets", "Wandsworth", "Westminster" )

# Central London Indicator
EDA_Data <- mutate(EDA_Data, inner_london = ifelse(is.element(EDA_Data$areaname, central), 1, 0))

Dealing with final missing values

# Replacing Missing value for Buckinghamshire prevalence of Obesity (18+) with South East Value of 8.313363
EDA_Data <- EDA_Data %>%
  mutate(obesity_qof_prevalence = replace(obesity_qof_prevalence, areaname == "Buckinghamshire", 8.313363))
# Removing City of London, Isles of Scilly and Torbay as they do not contain values for Childhood Obesity levels at age year 6 or reception.
EDA_Data <- subset(EDA_Data, areaname != "City of London")
EDA_Data <- subset(EDA_Data, areaname != "Isles of Scilly")
EDA_Data <- subset(EDA_Data, areaname != "Torbay")

# Removing Rutland as more than half of the data values are missing
EDA_Data <- subset(EDA_Data, areaname != "Rutland")
# Simplifying Wording for parentname
EDA_Data$parentname <- removeWords(EDA_Data$parentname, "region")
EDA_Data$parentname <- removeWords(EDA_Data$parentname, " and the Humber")
EDA_Data$parentname <- as.factor(EDA_Data$parentname)

Exploratory Data Analysis

Exploring the relationship between the Childhood Obesity in Reception and Year 6

# plotting reception against year 6 prevalence of obesty
plot1 <- ggplot(EDA_Data, aes(x = reception_prevalence_of_obesity, y = year_6_prevalence_of_obesity, text = areaname)) +
  geom_point() +
  labs(title = "Prevalence of Obesity at Reception (Age 4/5) and Year 6 (Age 10/11)") +
  xlab("Reception: Prevalence of Obesity 2010/11") +
  ylab("Year 6: Prevalence of Obesity 2017/18") + 
  theme(plot.title = element_text(hjust = 0.5))

ggplotly(plot1)
# Plotting England Year 6 Prevalence of Obesity
ggplot(EDA_Data, aes(x = "", y = year_6_prevalence_of_obesity)) +
  stat_boxplot(geom = "errorbar", width = 0.1) +
  geom_boxplot(width = 0.4, outlier.color = "red", outlier.shape = 1) +
  geom_jitter(width = 0.15, size = 1) +
  labs(title = "National Spread of Prevalence of Obesity (Age 10/11)") + 
  xlab("England") +
  ylab("Year 6 Prevalence of Obesity") +
  theme(plot.title = element_text(hjust = 0.5)) + 
  coord_flip()

# Plotting regional variation in Year 6 Prevalence of Obesity
ggplot(EDA_Data, aes(x = parentname, y = year_6_prevalence_of_obesity)) +
  stat_boxplot(geom = "errorbar") +
  geom_boxplot(outlier.color = "red", outlier.shape = 1) +
  labs(title = "Regional Spread of Prevalence of Obesity (Age 10/11)") +
  xlab("Region") +
  ylab("Year 6 Prevalence of Obesity") +
  theme(plot.title = element_text(hjust = 0.5)) + 
  coord_flip()

  • Clear relationship between childhood obesity in Reception and Year 6 pupils however there is some regional difference
  • The lowest 3 London values of Year 6 prevalence of obesity are outliers within the London region, however they are not outliers at a national level

Exploring the relationships between the predictive variables

Correlation Plot between the variables

# Calculating Correlation Matrix
correlation <- cor((select(EDA_Data, -"areaname", -"parentname", -"reception_prevalence_of_obesity", -"year_6_prevalence_of_obesity", -"london", -"inner_london")), use = "complete.obs")

# Using Custom Labels as the variable names were too long

colnames(correlation) <- c("Deprivation Score", "Persistent Abesentees", "Low Income Families", "School Readiness", "Social Health Needs", "Diabetes Admissions (0-9", "Decayed/Missing/Filled Teeth", "Obesity in 18+", "Physically Active Adults", "Asthma Admissions (0-9)", "Breastfeeding Initiation", "Gastroenteritis (2-4)", "MMR Vaccination (5)", "Child Mortality Rate", "Respiratory Infections (2-4)", "Woodland Access", "% Population Age 0-15", "Gender Pay Equity")

rownames(correlation) <- c("Deprivation Score", "Persistent Abesentees", "Low Income Families", "School Readiness", "Social Health Needs", "Diabetes Admissions (0-9", "Decayed/Missing/Filled Teeth", "Obesity in 18+", "Physically Active Adults", "Asthma Admissions (0-9)", "Breastfeeding Initiation", "Gastroenteritis (2-4)", "MMR Vaccination (5)", "Child Mortality Rate", "Respiratory Infections (2-4)", "Woodland Access", "% Population Age 0-15", "Gender Pay Equity")

# Plotting Correlation
corrplot(correlation, method = "circle", type = "upper", tl.cex = 0.75, tl.col = "black")

  • This is a good visual aid for seeing correlation between the predictive variables however the table below is more useful to see the individual correlation relationships

Finding highest correlations between variables

# Shows biggest correlations
correlation[upper.tri(correlation, diag = TRUE)] <- NA # sets upper half of correlation matrix to NA to avoid duplicates
m <- melt(correlation) # collapsing the dataframe
m <- m[order(-abs(m$value)), ] # ordering by correlation value (largest to smallest)
m <- na.omit(m) # removing NAs
head(m) # shows the top of the dataframe
##                             Var1                     Var2      value
## 3            Low Income Families        Deprivation Score  0.9540741
## 137     Breastfeeding Initiation           Obesity in 18+ -0.7962584
## 213 Respiratory Infections (2-4)    Gastroenteritis (2-4)  0.7850130
## 21           Low Income Families    Persistent Abesentees  0.7413458
## 2          Persistent Abesentees        Deprivation Score  0.7262395
## 192        Gastroenteritis (2-4) Breastfeeding Initiation -0.5902976

This table shows the 6 largest correlation relationships

  1. Low Income Families and Deprivation Score
  • Correlation value of 0.954
  • This is not surprising if we look at how the two variables are calculated:
    • low income families is the number of children aged between 5 and 10 in a family which is in receipt of Working Tax Credits, Child Tax Credits, Income Support, or Jobseekers Allowance.
    • IMD is comprised of Income Deprivation; Employment Deprivation; Education, Skills and Training Deprivation; Health Deprivation and Disability; Crime; Barriers to Housing and Service; and Living Environmental Deprivation.

While low income families is more closely related to children in an area, I chose to remove this predictive variable in favour of Deprivation score as this encompassed more measures of deprivation in more detail than the low income families indicator.

  1. Breastfeeding Initiation and Adult Obesity
# plotting breastfeeding initiation against adult obesity
ggplot(EDA_Data, aes(x = breastfeeding_initiation, 
                     y = obesity_qof_prevalence)) +     
  geom_point() +                                        # adding points
  xlab("Breastfeeding Initiation") +                    # adding x axis label
  ylab("Obesity 18+") +                                 # adding y axis label
  stat_smooth(method = "lm", col = "darkblue", se = F)  # adding linear model line

  • Correlation value of -0.796
  • There is a clear relationship between breastfeeding and prevalence of adult obesity in a local area.
  • “Observational studies have shown that breastfeeding is associated with lower levels of child obesity.” - As this relationship is known, I think it is important to keep both variables in my analysis, despite their high correlation value.
  • There is also evidence that babies who are breast fed experience lower levels of gastro-intestinal and respiratory infection which could explain the high correlation value between these variables

Breastfeeding Initiation and Population MMR Vaccination Coverage

# Plotting breastfeeding initiation against MMR vaccination coverage
ggplot(EDA_Data, aes(x = (breastfeeding_initiation),                             # plotting breastfeeding initiation on x axis 
                     y = population_vaccination_coverage_mmr_for_one_dose)) +    # plotting MMR Vaccination on y axis
  geom_point(aes(color = parentname)) +                                          # colouring the points by region
  labs(color = "Region") +                                                       # adding the ledgend 
  xlab("Breastfeeding Initiation") +                                             # x axis label
  ylab("MMR Vaccination Population Coverage")                                    # y axis label

  • Correlation value of -0.543
  • Surprising correlation between these two variables, wouldn’t necessarily expect a correlation
  • The London Variation in MMR Population Coverage supports the addition of a London Indicator
# plotting regional spread of MMR Vaccination coverage
ggplot(EDA_Data, aes(x = parentname, y = population_vaccination_coverage_mmr_for_one_dose)) + # plotting Vaccination rates against Region
  stat_boxplot(geom = "errorbar") +                                                           # adding end bars
  geom_boxplot(outlier.color = "red", outlier.shape = 1) +                                    # colouring outliers red circles
  xlab("Region") +                                                                            # X axis label
  ylab("MMR Vaccination") +                                                                   # Y axis label
  coord_flip()                                                                                # flipping the axis

  • The lack of variation in values for MMR Vaccination coverage excluding London suggests removing this variable

Analysis plan

Revisiting the aims:
Identify the most influencing factors for childhood obesity for children in in Year 6 (age 10/11 in 2017/18)
How do these vary by local area?*

Do this by creating an explanatory linear model to determine the most important predictors for modelling Year 6 obesity

Determine if different factors effect the prevalence of obesity at Reception for the same cohort of children (age 5/6 in 2010/11) within the same local area
Are these different and does this explain the increase in childhood obesity from Reception to Year 6?*

Do this by creating another explanatory linear model for reception in 2010/11 and comparing the important predictors with the previous model

Modelling

Year 6 Obesity

#Selecting the variables to model (ignoring Region and variables removed earlier in EDA)
model_data <- select(EDA_Data, -parentname, -reception_prevalence_of_obesity, -population_vaccination_coverage_mmr_for_one_dose, -children_in_low_income_families_aged_5_to_10, -london) # selecting data to model
# Inital Linear Model including all the quantitative predictors except MMR Vaccination coverage, Reception obesity levels and low income families
Model1 <- lm(year_6_prevalence_of_obesity ~ . - areaname, data = model_data)

VIF (Variance Inflation Factor) Test for multicolinearity

  • Values of 1 means that the predictor is not correlated with the other k predictors
  • Values exceeding 4 warrants further investigation
  • Values exceeding 10 are signs of serious multicolinearity
# VIF of Model 1 - Check for multicolinearity
car::vif(Model1)
##                                                                                                            deprivation_score 
##                                                                                                                     5.367664 
##                                                                                          persistent_absentees_primary_school 
##                                                                                                                     2.686040 
##                    school_readiness_the_percentage_of_children_achieving_a_good_level_of_development_at_the_end_of_reception 
##                                                                                                                     1.283347 
## `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs` 
##                                                                                                                     1.311034 
##                                                    admissions_for_diabetes_for_children_and_young_people_aged_under_19_years 
##                                                                                                                     1.427380 
##                                                                                                      dmft_in_three_year_olds 
##                                                                                                                     1.844987 
##                                                                                                       obesity_qof_prevalence 
##                                                                                                                     3.592104 
##                                                                                       percentage_of_physically_active_adults 
##                                                                                                                     2.406471 
##                                                                               admissions_for_asthma_for_children_aged_0_to_9 
##                                                                                                                     2.166784 
##                                                                                                     breastfeeding_initiation 
##                                                                                                                     4.758843 
##                                                                                               admissions_for_gastroenteritis 
##                                                                                                                     3.621918 
##                                                                                                             infant_mortality 
##                                                                                                                     1.967503 
##                                                     admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year 
##                                                                                                                     3.203037 
##                                                                                                           access_to_woodland 
##                                                                                                                     1.278087 
##                                                      `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` 
##                                                                                                                     1.744322 
##                                                                                                          gender_pay_equality 
##                                                                                                                     1.610330 
##                                                                                                                 inner_london 
##                                                                                                                     1.995328

Model Selection Methods

Stepwise

Forwards and Backwards Stepwise Regression using AIC

  • Forwards Stepwise Regression starts with the null model and adds one variable at a time, measuring fit using AIC
  • Backwards Stepwise Regression starts with the full model and removes one variable at a time, measuring fit using AIC
  • AIC (Akaike information criterion) is a measure of fit of the model which estimates the relative amount of information lost while penalising the complexity of the model
  • The function stepAIC (from the package MASS) calculates the AIC values from performing backwards and forwards stepwise regression and chooses the model with the lowest AIC value
# Stepwise model selection using AIC
MASS::stepAIC(Model1, direction = "both", trace = FALSE)
## 
## Call:
## lm(formula = year_6_prevalence_of_obesity ~ deprivation_score + 
##     obesity_qof_prevalence + percentage_of_physically_active_adults + 
##     admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year + 
##     `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` + 
##     inner_london, data = model_data)
## 
## Coefficients:
##                                                              (Intercept)  
##                                                                17.717987  
##                                                        deprivation_score  
##                                                                 0.247947  
##                                                   obesity_qof_prevalence  
##                                                                 0.258372  
##                                   percentage_of_physically_active_adults  
##                                                                -0.185006  
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year  
##                                                                -0.002349  
##  `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`  
##                                                                 0.410762  
##                                                             inner_london  
##                                                                 3.173306
  • This model has 6 predictors:
    • Deprivation score
    • Obesity at 18+
    • % of physically active adults
    • Admissions for respiratory tract infections
    • % of population aged 0-15
    • Inner London Indicator

Exhaustive Search using regsubsets for the best model with 1-16 predictors

  • Regsubsets is another way of selecting predictors.
  • It uses an exhaustive search method which means that it calculates the best model for each number of predictors by testing all of the possible combinations of the predictors
  • This is measured by BIC which works in the same way as AIC however penalises complex models more than AIC does
# performring model selection using regsubsets and an exhaustive search 
# This finds the best model for each number of predictors between 1 and 17 (full model)
aoutput <- (regsubsets(year_6_prevalence_of_obesity ~ . - areaname, data = model_data, nvmax = 17, method = "exhaustive"))
a <- summary(aoutput)

Plotting BIC for each model

  • Models can be selected by specifying the number of predictors, or selecting the model with the number of predictors that gives the lowest value of BIC
# BIC plot for models selected using regsubsets
plot(a$bic)

Selecting the 6 variable model using regsubsets output
- The 6 variable model gave the lowest BIC value - The predictive variables chosen in the 6 variable model can be found by selecting the variables with a * in row 6

# summary of output produced by regsubsets
summary(aoutput)
## Subset selection object
## Call: regsubsets.formula(year_6_prevalence_of_obesity ~ . - areaname, 
##     data = model_data, nvmax = 17, method = "exhaustive")
## 17 Variables  (and intercept)
##                                                                                                                              Forced in
## deprivation_score                                                                                                                FALSE
## persistent_absentees_primary_school                                                                                              FALSE
## school_readiness_the_percentage_of_children_achieving_a_good_level_of_development_at_the_end_of_reception                        FALSE
## `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs`     FALSE
## admissions_for_diabetes_for_children_and_young_people_aged_under_19_years                                                        FALSE
## dmft_in_three_year_olds                                                                                                          FALSE
## obesity_qof_prevalence                                                                                                           FALSE
## percentage_of_physically_active_adults                                                                                           FALSE
## admissions_for_asthma_for_children_aged_0_to_9                                                                                   FALSE
## breastfeeding_initiation                                                                                                         FALSE
## admissions_for_gastroenteritis                                                                                                   FALSE
## infant_mortality                                                                                                                 FALSE
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year                                                         FALSE
## access_to_woodland                                                                                                               FALSE
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`                                                          FALSE
## gender_pay_equality                                                                                                              FALSE
## inner_london                                                                                                                     FALSE
##                                                                                                                              Forced out
## deprivation_score                                                                                                                 FALSE
## persistent_absentees_primary_school                                                                                               FALSE
## school_readiness_the_percentage_of_children_achieving_a_good_level_of_development_at_the_end_of_reception                         FALSE
## `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs`      FALSE
## admissions_for_diabetes_for_children_and_young_people_aged_under_19_years                                                         FALSE
## dmft_in_three_year_olds                                                                                                           FALSE
## obesity_qof_prevalence                                                                                                            FALSE
## percentage_of_physically_active_adults                                                                                            FALSE
## admissions_for_asthma_for_children_aged_0_to_9                                                                                    FALSE
## breastfeeding_initiation                                                                                                          FALSE
## admissions_for_gastroenteritis                                                                                                    FALSE
## infant_mortality                                                                                                                  FALSE
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year                                                          FALSE
## access_to_woodland                                                                                                                FALSE
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`                                                           FALSE
## gender_pay_equality                                                                                                               FALSE
## inner_london                                                                                                                      FALSE
## 1 subsets of each size up to 17
## Selection Algorithm: exhaustive
##           deprivation_score persistent_absentees_primary_school
## 1  ( 1 )  "*"               " "                                
## 2  ( 1 )  "*"               " "                                
## 3  ( 1 )  "*"               " "                                
## 4  ( 1 )  "*"               " "                                
## 5  ( 1 )  "*"               " "                                
## 6  ( 1 )  "*"               " "                                
## 7  ( 1 )  "*"               " "                                
## 8  ( 1 )  "*"               " "                                
## 9  ( 1 )  "*"               " "                                
## 10  ( 1 ) "*"               " "                                
## 11  ( 1 ) "*"               " "                                
## 12  ( 1 ) "*"               "*"                                
## 13  ( 1 ) "*"               "*"                                
## 14  ( 1 ) "*"               "*"                                
## 15  ( 1 ) "*"               "*"                                
## 16  ( 1 ) "*"               "*"                                
## 17  ( 1 ) "*"               "*"                                
##           school_readiness_the_percentage_of_children_achieving_a_good_level_of_development_at_the_end_of_reception
## 1  ( 1 )  " "                                                                                                      
## 2  ( 1 )  " "                                                                                                      
## 3  ( 1 )  " "                                                                                                      
## 4  ( 1 )  " "                                                                                                      
## 5  ( 1 )  " "                                                                                                      
## 6  ( 1 )  " "                                                                                                      
## 7  ( 1 )  " "                                                                                                      
## 8  ( 1 )  " "                                                                                                      
## 9  ( 1 )  " "                                                                                                      
## 10  ( 1 ) " "                                                                                                      
## 11  ( 1 ) " "                                                                                                      
## 12  ( 1 ) " "                                                                                                      
## 13  ( 1 ) " "                                                                                                      
## 14  ( 1 ) " "                                                                                                      
## 15  ( 1 ) " "                                                                                                      
## 16  ( 1 ) " "                                                                                                      
## 17  ( 1 ) "*"                                                                                                      
##           `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs`
## 1  ( 1 )  " "                                                                                                                         
## 2  ( 1 )  " "                                                                                                                         
## 3  ( 1 )  " "                                                                                                                         
## 4  ( 1 )  " "                                                                                                                         
## 5  ( 1 )  " "                                                                                                                         
## 6  ( 1 )  " "                                                                                                                         
## 7  ( 1 )  " "                                                                                                                         
## 8  ( 1 )  " "                                                                                                                         
## 9  ( 1 )  " "                                                                                                                         
## 10  ( 1 ) "*"                                                                                                                         
## 11  ( 1 ) "*"                                                                                                                         
## 12  ( 1 ) "*"                                                                                                                         
## 13  ( 1 ) "*"                                                                                                                         
## 14  ( 1 ) "*"                                                                                                                         
## 15  ( 1 ) "*"                                                                                                                         
## 16  ( 1 ) "*"                                                                                                                         
## 17  ( 1 ) "*"                                                                                                                         
##           admissions_for_diabetes_for_children_and_young_people_aged_under_19_years
## 1  ( 1 )  " "                                                                      
## 2  ( 1 )  " "                                                                      
## 3  ( 1 )  " "                                                                      
## 4  ( 1 )  " "                                                                      
## 5  ( 1 )  " "                                                                      
## 6  ( 1 )  " "                                                                      
## 7  ( 1 )  " "                                                                      
## 8  ( 1 )  "*"                                                                      
## 9  ( 1 )  "*"                                                                      
## 10  ( 1 ) "*"                                                                      
## 11  ( 1 ) "*"                                                                      
## 12  ( 1 ) "*"                                                                      
## 13  ( 1 ) "*"                                                                      
## 14  ( 1 ) "*"                                                                      
## 15  ( 1 ) "*"                                                                      
## 16  ( 1 ) "*"                                                                      
## 17  ( 1 ) "*"                                                                      
##           dmft_in_three_year_olds obesity_qof_prevalence
## 1  ( 1 )  " "                     " "                   
## 2  ( 1 )  " "                     " "                   
## 3  ( 1 )  " "                     " "                   
## 4  ( 1 )  " "                     " "                   
## 5  ( 1 )  " "                     " "                   
## 6  ( 1 )  " "                     "*"                   
## 7  ( 1 )  " "                     "*"                   
## 8  ( 1 )  " "                     "*"                   
## 9  ( 1 )  " "                     "*"                   
## 10  ( 1 ) " "                     "*"                   
## 11  ( 1 ) "*"                     "*"                   
## 12  ( 1 ) "*"                     "*"                   
## 13  ( 1 ) "*"                     "*"                   
## 14  ( 1 ) "*"                     "*"                   
## 15  ( 1 ) "*"                     "*"                   
## 16  ( 1 ) "*"                     "*"                   
## 17  ( 1 ) "*"                     "*"                   
##           percentage_of_physically_active_adults
## 1  ( 1 )  " "                                   
## 2  ( 1 )  " "                                   
## 3  ( 1 )  "*"                                   
## 4  ( 1 )  "*"                                   
## 5  ( 1 )  "*"                                   
## 6  ( 1 )  "*"                                   
## 7  ( 1 )  "*"                                   
## 8  ( 1 )  "*"                                   
## 9  ( 1 )  "*"                                   
## 10  ( 1 ) "*"                                   
## 11  ( 1 ) "*"                                   
## 12  ( 1 ) "*"                                   
## 13  ( 1 ) "*"                                   
## 14  ( 1 ) "*"                                   
## 15  ( 1 ) "*"                                   
## 16  ( 1 ) "*"                                   
## 17  ( 1 ) "*"                                   
##           admissions_for_asthma_for_children_aged_0_to_9
## 1  ( 1 )  " "                                           
## 2  ( 1 )  " "                                           
## 3  ( 1 )  " "                                           
## 4  ( 1 )  " "                                           
## 5  ( 1 )  " "                                           
## 6  ( 1 )  " "                                           
## 7  ( 1 )  " "                                           
## 8  ( 1 )  " "                                           
## 9  ( 1 )  " "                                           
## 10  ( 1 ) " "                                           
## 11  ( 1 ) " "                                           
## 12  ( 1 ) " "                                           
## 13  ( 1 ) " "                                           
## 14  ( 1 ) "*"                                           
## 15  ( 1 ) "*"                                           
## 16  ( 1 ) "*"                                           
## 17  ( 1 ) "*"                                           
##           breastfeeding_initiation admissions_for_gastroenteritis
## 1  ( 1 )  " "                      " "                           
## 2  ( 1 )  " "                      " "                           
## 3  ( 1 )  " "                      " "                           
## 4  ( 1 )  " "                      " "                           
## 5  ( 1 )  " "                      " "                           
## 6  ( 1 )  " "                      " "                           
## 7  ( 1 )  " "                      " "                           
## 8  ( 1 )  " "                      " "                           
## 9  ( 1 )  " "                      "*"                           
## 10  ( 1 ) " "                      "*"                           
## 11  ( 1 ) " "                      "*"                           
## 12  ( 1 ) " "                      "*"                           
## 13  ( 1 ) "*"                      "*"                           
## 14  ( 1 ) "*"                      "*"                           
## 15  ( 1 ) "*"                      "*"                           
## 16  ( 1 ) "*"                      "*"                           
## 17  ( 1 ) "*"                      "*"                           
##           infant_mortality
## 1  ( 1 )  " "             
## 2  ( 1 )  " "             
## 3  ( 1 )  " "             
## 4  ( 1 )  " "             
## 5  ( 1 )  " "             
## 6  ( 1 )  " "             
## 7  ( 1 )  " "             
## 8  ( 1 )  " "             
## 9  ( 1 )  " "             
## 10  ( 1 ) " "             
## 11  ( 1 ) " "             
## 12  ( 1 ) " "             
## 13  ( 1 ) " "             
## 14  ( 1 ) " "             
## 15  ( 1 ) " "             
## 16  ( 1 ) "*"             
## 17  ( 1 ) "*"             
##           admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year
## 1  ( 1 )  " "                                                                     
## 2  ( 1 )  " "                                                                     
## 3  ( 1 )  " "                                                                     
## 4  ( 1 )  " "                                                                     
## 5  ( 1 )  "*"                                                                     
## 6  ( 1 )  "*"                                                                     
## 7  ( 1 )  "*"                                                                     
## 8  ( 1 )  "*"                                                                     
## 9  ( 1 )  "*"                                                                     
## 10  ( 1 ) "*"                                                                     
## 11  ( 1 ) "*"                                                                     
## 12  ( 1 ) "*"                                                                     
## 13  ( 1 ) "*"                                                                     
## 14  ( 1 ) "*"                                                                     
## 15  ( 1 ) "*"                                                                     
## 16  ( 1 ) "*"                                                                     
## 17  ( 1 ) "*"                                                                     
##           access_to_woodland
## 1  ( 1 )  " "               
## 2  ( 1 )  " "               
## 3  ( 1 )  " "               
## 4  ( 1 )  " "               
## 5  ( 1 )  " "               
## 6  ( 1 )  " "               
## 7  ( 1 )  " "               
## 8  ( 1 )  " "               
## 9  ( 1 )  " "               
## 10  ( 1 ) " "               
## 11  ( 1 ) " "               
## 12  ( 1 ) " "               
## 13  ( 1 ) " "               
## 14  ( 1 ) " "               
## 15  ( 1 ) "*"               
## 16  ( 1 ) "*"               
## 17  ( 1 ) "*"               
##           `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`
## 1  ( 1 )  " "                                                                    
## 2  ( 1 )  "*"                                                                    
## 3  ( 1 )  " "                                                                    
## 4  ( 1 )  "*"                                                                    
## 5  ( 1 )  "*"                                                                    
## 6  ( 1 )  "*"                                                                    
## 7  ( 1 )  "*"                                                                    
## 8  ( 1 )  "*"                                                                    
## 9  ( 1 )  "*"                                                                    
## 10  ( 1 ) "*"                                                                    
## 11  ( 1 ) "*"                                                                    
## 12  ( 1 ) "*"                                                                    
## 13  ( 1 ) "*"                                                                    
## 14  ( 1 ) "*"                                                                    
## 15  ( 1 ) "*"                                                                    
## 16  ( 1 ) "*"                                                                    
## 17  ( 1 ) "*"                                                                    
##           gender_pay_equality inner_london
## 1  ( 1 )  " "                 " "         
## 2  ( 1 )  " "                 " "         
## 3  ( 1 )  " "                 "*"         
## 4  ( 1 )  " "                 "*"         
## 5  ( 1 )  " "                 "*"         
## 6  ( 1 )  " "                 "*"         
## 7  ( 1 )  "*"                 "*"         
## 8  ( 1 )  "*"                 "*"         
## 9  ( 1 )  "*"                 "*"         
## 10  ( 1 ) "*"                 "*"         
## 11  ( 1 ) "*"                 "*"         
## 12  ( 1 ) "*"                 "*"         
## 13  ( 1 ) "*"                 "*"         
## 14  ( 1 ) "*"                 "*"         
## 15  ( 1 ) "*"                 "*"         
## 16  ( 1 ) "*"                 "*"         
## 17  ( 1 ) "*"                 "*"

Fitting the chosen model

# Fitting the model chosen using regsubsets minimising BIC
modely6 <- lm(year_6_prevalence_of_obesity ~ deprivation_score + `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` + inner_london + obesity_qof_prevalence + admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year + percentage_of_physically_active_adults, data = model_data)
# Model Summary
summary(modely6)
## 
## Call:
## lm(formula = year_6_prevalence_of_obesity ~ deprivation_score + 
##     `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` + 
##     inner_london + obesity_qof_prevalence + admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year + 
##     percentage_of_physically_active_adults, data = model_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1862 -1.2957  0.0226  1.2730  4.3484 
## 
## Coefficients:
##                                                                           Estimate
## (Intercept)                                                              17.717987
## deprivation_score                                                         0.247947
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`   0.410762
## inner_london                                                              3.173306
## obesity_qof_prevalence                                                    0.258372
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year -0.002349
## percentage_of_physically_active_adults                                   -0.185006
##                                                                          Std. Error
## (Intercept)                                                                4.707109
## deprivation_score                                                          0.024989
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`    0.084174
## inner_london                                                               0.696160
## obesity_qof_prevalence                                                     0.093076
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year   0.000788
## percentage_of_physically_active_adults                                     0.050373
##                                                                          t value
## (Intercept)                                                                3.764
## deprivation_score                                                          9.922
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`    4.880
## inner_london                                                               4.558
## obesity_qof_prevalence                                                     2.776
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year  -2.981
## percentage_of_physically_active_adults                                    -3.673
##                                                                          Pr(>|t|)
## (Intercept)                                                              0.000245
## deprivation_score                                                         < 2e-16
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`  2.83e-06
## inner_london                                                             1.11e-05
## obesity_qof_prevalence                                                   0.006252
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year 0.003386
## percentage_of_physically_active_adults                                   0.000340
##                                                                             
## (Intercept)                                                              ***
## deprivation_score                                                        ***
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`  ***
## inner_london                                                             ***
## obesity_qof_prevalence                                                   ** 
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year ** 
## percentage_of_physically_active_adults                                   ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.794 on 141 degrees of freedom
## Multiple R-squared:  0.7713, Adjusted R-squared:  0.7615 
## F-statistic: 79.24 on 6 and 141 DF,  p-value: < 2.2e-16

Chosen Indicators
1. deprivation score - positive correlation
The more deprived an area is, the higher their levels of year 6 obesity
2. percentage of the total resident population aged 0-15 - positive correlation
The larger the percentage of children aged 0-15 living in a local area, the higher their levels of year 6 obesity
3. ‘London Factor’ - positive correlation
If a child is living in London, they are more likely to be obese
4. Physically active adults - negative correlation
The larger the percentage of physically active adults, the lower the levels of childhood obesity 5. Admissions for respiratory tract infections in infants under 1 year old - negative correlation
The higher the admissions for respiratory tract infections in infants in a local area, the lower the levels of childhood obesity
6. % Obesity 18+ - positive correlation
The larger the percentage of obese adults in the local area, the higher the level of year 6 obesity in the same area

Additional Indicators using AIC:

  • These are the same indicators chosen with stepwise AIC
# Model plots
(plot(modely6))

## NULL

Areas of Interest
1 area that the model is significantly overfitting (predicting higher levels of obesity than the true value)
- 72: Barnsley 2 areas the model is significantly underfitting - 82: Dudley - 116: Merton

Reception Model

# Selecting the Data
model_data2 <- select(EDA_Data, -parentname, -year_6_prevalence_of_obesity, -children_in_low_income_families_aged_5_to_10, -population_vaccination_coverage_mmr_for_one_dose, -london)
# Fitting the full model
Model3 <- lm(reception_prevalence_of_obesity ~ . - areaname, data = model_data2)

Stepwise Regression

# Model selection using forwards/backwards stepwise regression and AIC
MASS::stepAIC(Model3, direction = "both", trace = FALSE)
## 
## Call:
## lm(formula = reception_prevalence_of_obesity ~ deprivation_score + 
##     `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs` + 
##     obesity_qof_prevalence + percentage_of_physically_active_adults + 
##     breastfeeding_initiation + admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year + 
##     `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` + 
##     inner_london, data = model_data2)
## 
## Coefficients:
##                                                                                                                  (Intercept)  
##                                                                                                                    4.0917442  
##                                                                                                            deprivation_score  
##                                                                                                                    0.0977588  
## `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs`  
##                                                                                                                    0.2990894  
##                                                                                                       obesity_qof_prevalence  
##                                                                                                                    0.1940950  
##                                                                                       percentage_of_physically_active_adults  
##                                                                                                                   -0.0558367  
##                                                                                                     breastfeeding_initiation  
##                                                                                                                    0.0346537  
##                                                     admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year  
##                                                                                                                   -0.0009793  
##                                                      `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`  
##                                                                                                                    0.1190515  
##                                                                                                                 inner_london  
##                                                                                                                    1.6126184

The Stepwise Selection Model has 8 predictors:

  • Deprivation Score
  • Social Health Needs
  • Obesity 18+
  • Physically Active Adults
  • Breastfeeding at birth
  • Admissions for respiratory tract infections
  • % of population aged 0-15
  • Inner London Indicator

Regsubsets

# exhaustive search using regsubsets 
aoutput2 <- (regsubsets(reception_prevalence_of_obesity ~ . - areaname, data = model_data2, nvmax = 16, method = "exhaustive"))
# summary of the output, use this to select models
a2 <- summary(aoutput2)

Plotting BIC

# BIC Values for the best model with each number of variables
plot(a2$bic)

Chosen Variables
1. Deprivation Score - Positive relationship
2. London Factor - Positive relationship
3. % of population aged 0-15 4. Breastfeeding at Birth 5. Obesity in Adults 18+ - Positive relationship 6. Admissions for Respiratory Tract Infections using AIC, additional 2 variables:

  • Social Health Needs - positive relationship
  • Physically active Adults

Chosen Model

# Reception Model Chosen with regsubsets using BIC
Model4 <- lm(reception_prevalence_of_obesity ~ deprivation_score + obesity_qof_prevalence + inner_london + `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` + breastfeeding_initiation + admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year, data = model_data2)
summary(Model4)
## 
## Call:
## lm(formula = reception_prevalence_of_obesity ~ deprivation_score + 
##     obesity_qof_prevalence + inner_london + `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` + 
##     breastfeeding_initiation + admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year, 
##     data = model_data2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9144 -0.7459  0.0329  0.7097  3.7761 
## 
## Coefficients:
##                                                                            Estimate
## (Intercept)                                                               0.0725200
## deprivation_score                                                         0.1122737
## obesity_qof_prevalence                                                    0.2272590
## inner_london                                                              1.4587311
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`   0.1401095
## breastfeeding_initiation                                                  0.0350890
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year -0.0010747
##                                                                          Std. Error
## (Intercept)                                                               1.7274942
## deprivation_score                                                         0.0148745
## obesity_qof_prevalence                                                    0.0649953
## inner_london                                                              0.4318203
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`   0.0471724
## breastfeeding_initiation                                                  0.0146496
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year  0.0004835
##                                                                          t value
## (Intercept)                                                                0.042
## deprivation_score                                                          7.548
## obesity_qof_prevalence                                                     3.497
## inner_london                                                               3.378
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`    2.970
## breastfeeding_initiation                                                   2.395
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year  -2.223
##                                                                          Pr(>|t|)
## (Intercept)                                                              0.966574
## deprivation_score                                                           5e-12
## obesity_qof_prevalence                                                   0.000631
## inner_london                                                             0.000944
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`  0.003500
## breastfeeding_initiation                                                 0.017924
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year 0.027817
##                                                                             
## (Intercept)                                                                 
## deprivation_score                                                        ***
## obesity_qof_prevalence                                                   ***
## inner_london                                                             ***
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`  ** 
## breastfeeding_initiation                                                 *  
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.086 on 141 degrees of freedom
## Multiple R-squared:  0.5765, Adjusted R-squared:  0.5585 
## F-statistic:    32 on 6 and 141 DF,  p-value: < 2.2e-16
# model plots
plot(Model4)

Points of interest
38: Reading
102: Enfield
112: Richmond upon Thames

Comparing Y6 and Reception

Reception
1. Deprivation Score
2. Inner London Indicator
3. % of population aged 0-15
4. Breastfeeding at Birth
5. Obesity in Adults 18+ - Positive relationship
6. Admissions for Respiratory Tract Infections

Year 6
1. deprivation score
2. % of population aged 0-15
3. Inner London Indicator
4. Physically active adults
5. Admissions for Respiratory Tract Infections
6. Admissions for Respiratory Tract Infections

  • The models contain laregely the same indicators
  • The reception model contains the variable breastfeeding at birth which does not feature in the year 6 model
  • The year 6 model contains the variable physically active adults, which does not feature in the reception model
  • The AIC Reception model also includes the variable Social Health Needs that does not feature in either of the Year 6 models.

The Reception model has an adjusted r squared value of 0.559 for the 6 variable model where as the year 6 model has an adjusted r squared value of 0.762 for the same number of variables. This shows that the Year 6 model is fitting the data much better than the reception model.

Evaluation

  • Would like to repeat the model for 2016/17 Y6 and 2009/10 Reception and see if time makes a difference

  • ‘ignoring’ time when selecting the explanatory variables (how much of a time delay is there in certain variables affecting others?)

  • Introduce more predictor variables or different time choices more suited to modelling reception children

  • More/different predictors, not just ones on fingertips (e.g. number of fast food resturants in local area, average shopping basket, london affect)

  • Compare specific children through time